ABSTRACT


Performance measurements of a database machine reflect not only the processing power of the machine, but also the size and structure of the database. It is therefore useful to construct databases for performance measurements of database machines. Furthermore, it is useful to utilize synthetic data, such that the volume of the reply can be predicted for a given query and the structure and attributes of the database can be varied for intended test queries. Conducting measurement studies using a synthetic database contributes to the generality of the results when different test queries are employed. A parameterized program is described herein which can be used to generate various relations of a synthetic database. Some experiences in constructing and using the database generator are described. It is expected that, given sufficient information on real-world databases, the generator may be useful for modeling them as well as for creating databases for benchmark tests.

I. BENCHMARKS FOR DATABASE MACHINES


A. PERFORMANCE MEASUREMENTS

In comparing database management systems (DBMSs) an important factor is their performance. One way to compare them is to run specific applications under a variety of systems. Each system can be "fine-tuned" to give the best result. An evaluation based on such a method is costly and time-consuming. Often such a method may be infeasible. In such cases, a database for the specific applications may not even exist. As a second method, an evaluation could be made on the basis of performance measurements of existing databases. This method is less costly and less time-consuming. However, the following questions arise. Is the existing database sufficient to support intended applications? Are the applications good for conducting relative performance evaluation of different DBMSs?

It is impractical to perform such a general comparison of DBMSs. Adapting an application to several systems for evaluation purposes is not practical. Evaluation based on existing databases is subject to interpretation error. The increasing number of DBMSs makes it imperative that some method be devised to do comparative performance measurements.







B. BENCHMARKING

The concept of a standard for measuring performance is not new. The standard is usually known as a benchmark, after the markers used by surveyors in establishing a common reference point for their measurements. For example, Mount Diablo (a mountain east of San Francisco) is used as the reference point in surveying much of Northern California due to its long-range visibility. A method for measuring similar items in reference to a standard is called benchmarking.

Precedents for benchmarking exist in measuring the performance of computer systems. One such method measures the execution time of a specific set of application programs for benchmarking computer systems. The expected performance of a system could be computed by characterizing the expected workload as a mix of jobs from the standard set.

It is proposed that a set of application programs can be devised to measure the performance of DBMSs. Using these benchmark measurements, it will be possible to compare the performance of various DBMSs. The measurements can be analyzed to suggest strengths and weaknesses of the DBMSs.


C. PERFORMANCE TO BE MEASURED
The generally accepted performance index for a DBMS is the response time. Defining the response time as the primary performance index is the scope of this research. However, the response time is based on several factors. Among these factors are the time to process the query, the time to access data, the time to process data, and the time to return the data. For a DBMS running on a mainframe computer, the effects of other workload on the response time must also be considered.

A measurement of the response time is more significant when measurements of its components are provided. Some simplifying assumptions may be made. The first such assumption is that the rate of accessing data in the database is constant. The second is that the rate of returning the accessed data is constant. However, the time involved in the parsing of queries and the time involved in the processing of data may vary greatly among database operations. In order to record the variance of time among the operations, tests must be devised which will indicate these components for all supported operations.
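
The decomposition of the response time can be illustrated with a short Python sketch. It is not taken from the thesis: the names `Rates` and `variable_component`, the units, and the purely additive model are assumptions made for illustration. Under the two constant-rate assumptions, the access and return times follow from the data volumes, and whatever remains of a measured response time is attributed to query parsing and data processing.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Assumed-constant transfer rates (hypothetical units: bytes per second)."""
    access_bytes_per_s: float   # rate of accessing data in the database
    return_bytes_per_s: float   # rate of returning accessed data to the host


def variable_component(response_s: float, bytes_accessed: int,
                       bytes_returned: int, rates: Rates) -> float:
    """Estimate the time left over for query parsing and data processing.

    The access and return times are derived from the constant-rate
    assumptions and subtracted from the measured response time.
    """
    access_s = bytes_accessed / rates.access_bytes_per_s
    return_s = bytes_returned / rates.return_bytes_per_s
    return response_s - access_s - return_s
```

Comparing this remainder across operations (selection, join, update, and so on) is one way to expose the variance that a single response-time figure hides.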

This thesis focuses on measurements of the response time. A development of a system to measure components of the response time is discussed. The system involves the generation of a synthetic database. The system also measures the benchmarked machine in using that database.

II. BENCHMARKING RELATIONAL DATABASE MACHINES

A. THE BENCHMARKING ENVIRONMENT

The research done in support of this thesis has been performed in a complex environment. The complexity involves multiple machines and multiple operating systems.

A Relation Generator (RG) of synthetic relations has been developed using Pascal (i.e., IBM's Pascal/VS) in a multiuser environment (VM/CMS running on an IBM 3033). RG is used in a batch environment (MVS) on the same machine. The relations are generated in EBCDIC-character form. They are transported to a UNIVAC 1100 via tape. The EBCDIC files are then loaded onto the host (i.e., the UNIVAC computer) and translated by the host into ASCII files. These ASCII files are finally loaded into a backend database machine (i.e., Britton Lee's IDM 500).

The backend machine and interface software for the 1100 series computers are marketed by the Amperif Corporation of Chatsworth, California, as the RDM 1100. Additional measurements can be made by bypassing the part of the query processor that provides terminal support. This is accomplished by communicating directly with the query processor via compiled language statements (i.e., COBOL). This does not completely bypass the query processor, because the query language is interpreted and cannot be precompiled. However, the results show that query processing does not represent a significant portion of the response time if the host workload is light. The terminal handler also represents a small portion of the response time. Therefore, the only advantage in the use of compiled programs is the option of running the process as a background job.


B. THE ARCHITECTURE OF THE SYSTEM

The architecture of the system encompasses two major areas. The first of these areas is the internal architecture of the IDM 500. The second area is the software, i.e., the user interface which runs on the host.

1. The Database Machine Architecture and Various Configurations

The IDM 500 is made up of several modules connected to a common high-speed bus (See Figure 1). The database processor is a 6-MHz Zilog Z-8000 series microprocessor which performs the DBMS functions. The software for this microprocessor is written largely in the C programming language, along with some assembly language routines. It comprises about 358 k-bytes of machine code. An optional module, the database accelerator, improves the system performance by implementing in high-speed, special-purpose hardware some of the DBMS functions normally performed by the database processor.





[Figure 1: block diagram. The host interface, the database processor, and the expandable cache memory are attached to the high-speed bus, together with the optional database accelerator and the disk controller, which connects to the disks holding the database.]

Figure 1 - The IDM Bus Structure

The cache memory is composed of 64k-bit dynamic RAM chips. The basic configuration (at the beginning of the tests) included one-half megabyte of memory. Additional memory, up to several megabytes, can be supported. During the test period, configurations of one and two megabytes have also been used. From one to four disk controllers may be installed. Each controller supports up to four six-hundred-megabyte disks. A tape controller may be installed to support backing up and loading data.

Several standard host interfaces are available. A byte-wide parallel interface is available for connection to mainframes and minicomputers. A second interface can be used to provide multiple RS-232 serial ports to microcomputers. A special byte/word interface for communication with UNIVAC host computers is supplied by the Amperif Corporation.


2. The Database Organization

The IDM 500 software supports the relational database model. Data is stored on the disk in two logical levels. These levels are the system database and the user databases. At the top level, the system database contains the system tables and thirteen database tables. The system tables contain information on hardware configuration, databases and current usage. The thirteen database tables comprise the data dictionary. They are used to maintain information about relations, attributes, users, and security. A list of the system tables and the database tables is given in Appendix A.

Although access to the system database is required for the creation of a user database, an existing user database can be accessed directly, i.e., without going through the system database. Each user database has both database tables and user tables. The database tables are stored within the user database and may be accessed in the same manner as user tables.

The basic unit of disk access is a 2k-byte block. When a database is created, a space allocation in blocks may be requested. This allocation may be increased if necessary. Both system tables and database tables are used by the system to compute physical addresses.
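
Because space is allocated in 2k-byte blocks, the size of an allocation request can be estimated from the number of tuples and the tuple length. The following Python sketch is illustrative only. The function name `blocks_needed` is hypothetical, and the packing rule (whole tuples only, no per-block overhead) is a simplifying assumption rather than documented IDM 500 behavior.

```python
BLOCK_BYTES = 2048  # the 2k-byte unit of disk access


def blocks_needed(num_tuples: int, tuple_bytes: int,
                  block_bytes: int = BLOCK_BYTES) -> int:
    """Blocks required if whole tuples are packed into blocks.

    Assumes no per-block overhead and no tuple spanning two blocks;
    both are simplifications made for this sketch.
    """
    if tuple_bytes <= 0 or tuple_bytes > block_bytes:
        raise ValueError("tuple length must be between 1 and the block size")
    tuples_per_block = block_bytes // tuple_bytes
    # Ceiling division: a partly filled last block still counts.
    return -(-num_tuples // tuples_per_block)
```

Under these assumptions a relation of 500 tuples of 100 bytes occupies 25 blocks, since 20 tuples fit in each block. A real allocation would be somewhat larger because of block headers and index space.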


3. The User Interface

The user interface is accessed by invoking a process on the host. This process is an interactive query processor. The query processor parses the user's queries written in the Relational Query Language (RQL). RQL is Amperif's implementation of Britton-Lee's Intelligent Query Language (IQL). Alternatively, queries may be submitted to the query processor from a compiled COBOL or FORTRAN program. By submitting a compiled program as a batch job, the user may bypass the query processor's terminal handler. However, the batch job still depends on the query processor for parsing of the query.

The Relational Query Language (RQL) provides operations and facilities similar to those available on relational DBMSs currently running on mainframe computers and larger minicomputers. RQL also allows queries to be pre-parsed and stored within a database. These stored commands limit the time required in the host for parsing and reduce the time required in the backend for the database-table lookup. Additional information on RQL may be found in [1, 2 and 3].

Communication with the IDM is via a system process, RDMIO. RDMIO supervises communications between user processes running on the host and the hardware interface to the IDM (See Figure 2). Up to ten users may access the RDM simultaneously from a single UNIVAC host.





[Figure 2: block diagram. User terminals connect through terminal interfaces to query processors (parsing, query generation, output formatting), which communicate with the IDM through RDMIO.]

Figure 2 - The IDM/User Interface

III. THE BENCHMARKING APPROACH


A. A MULTI-DIMENSIONAL PROBLEM

Building a benchmarking system poses a problem with several dimensions. The problem can be broken down into two major areas. These areas are modeling and measurement.

1. Modeling Problems

The modeling problems can be categorized as DBMS-dependent and database-dependent. The DBMS-dependent modeling problems are related to DBMS schema and syntax. The database-dependent problems are related to the characteristics of the database and the application to be modeled.

a. DBMS-dependent Problems

The three widely known database models are the hierarchical, the network, and the relational. It has been shown that databases and applications based on one of these models can be translated to any other model. However, there is no accepted basis for meaningful comparisons of their performance measurements. As a first step, efforts have been made in support of establishing such a basis for DBMSs having the same underlying model, specifically the relational model.

b. Database-dependent Problems

The database-dependent problems are representative of existing databases and the applications which are run on them. Existing databases vary in their complexity and in the efficiency with which they have been implemented. These varieties are partly due to the physical data that are represented in the database and partly due to the programmers' abilities to construct the database. Additionally, the applications which use these databases also model the physical data represented as well as the information required of the database. Thus, both existing databases and applications must be modeled. The key to an effective and useful model is creating one which represents common characteristics. The characteristics of databases and applications must be carefully studied prior to the design of a general and effective model. The contrasting nature of existing databases and their applications presents an extremely complex modeling problem.

2. Measurement Problems

DBMS benchmark measurements, as a standard, may also represent a comparison of DBMS performance. This standard may be either absolute or relative. Absolute measurements assume a fixed standard. Relative measurements provide rankings within a group of DBMSs. The measurement of the response time for relative ranking is our goal.

Experiments must be constructed carefully and the environment must be controlled to provide usable, accurate measurements. For example, in performing research for this thesis, it has been noticed that the load on the host can significantly affect the response time as seen by the user. Similarly, the response time is heavily affected by the time required to return the data to the user at the screen. These effects must be minimized in order to obtain measurements which more accurately reflect the performance of the backend database machine. Resolution of measurement problems is discussed in Section V.B.


B. RESOLVING THE MODELING PROBLEMS

Although the modeling problems cannot be eliminated, steps can be taken to minimize the errors introduced by the modeling process.

1. DBMS-dependent Problems

Two assumptions can be made to minimize the DBMS-dependent modeling errors. These assumptions concern the form of the data and the operations used to access the data.

The first assumption is that all relations are stored in third normal form (3NF). The use of 3NF minimizes the possibility of inconsistent data. While real databases may not use 3NF, this fact does not discourage our assumption. The benchmark is designed to provide a measurement of DBMSs' performance. It is not intended to take into consideration the abilities of those persons who will design the databases (although ease of use may be a consideration in some instances), for they may not understand the theory of 3NF.

The second assumption to be made is that the query languages used by the DBMSs are logically equivalent. Although differences in syntax do exist, they generally do not affect the breadth of available operations. Therefore, a common set of queries can be implemented in the DBMSs' individual syntaxes and provide identical logical results. Any variations to this should be noted with the benchmark results. The basic set of experiments includes selections, projections, joins, updates, insertions and deletions. Additionally, experiments should be performed which test the performance of any peculiar or powerful operations which a DBMS may have in addition to the standard set.

2. Database-dependent Problems

The elimination of database-dependent modeling problems involves two fundamental areas. The first of these areas is the generation of a synthetic database. The generation of such a database allows the use of data which is generally representative of existing databases, but not specifically representative of any one. The design of the synthetic database's characteristics should be broad. This ensures that it can be adapted to realistically measure the performance of a database with its own characteristics. These characteristics include the sizes of the relations (in the number of tuples and tuple length) in the database and the length of a tuple relative to the block size of the storage medium.

The second area involving database dependency includes the applications running on the database. A synthetic workload is required for the same reasons as for the synthetic database. The design of the synthetic workload is broad enough to provide enough results to be able to simulate different applications. The workload is designed with two major considerations. The first consideration is support of the basic relational operations discussed previously. An additional consideration takes into account the varying access patterns of existing databases. For example, a given application may repeatedly retrieve only one tuple at a time. Another may retrieve many in one operation. An important characteristic is the quantity of the data retrieved by operations. This characteristic may produce different levels of performance with different indexing methods.


C. THE SYNTHESIZED DATABASE AND WORKLOAD

1. The Use of Synthesized Data

In determining a set of benchmark measurements, it is necessary to obtain a set which can be used on a wide range of DBMSs. It is also important that this set does not favor any DBMS or class of DBMSs.

Two approaches could have been taken in obtaining measurements. One approach would be to perform tests on existing databases. The other approach is to do measurements on a synthetic database. The latter allows the greatest flexibility in performing operations on the database. This is because the schema of a real database might not provide a suitable structure for performing a test of some operations. The schema of a synthetic database, on the other hand, minimizes any bias resulting from designing the tests around a particular database.

The research for this thesis is performed in conjunction with evaluation of relational database machines. However, the installation has no relational databases. Therefore, any tests on the DBMS would have to be performed on either a synthetic database or a database converted from another model. Since the use of synthetic databases supports a more general approach in benchmarking, the choice has been made to generate such databases for benchmarking tests.

2. Types of Synthesized Data

Synthesized data should have one major characteristic. The types of data should be broad enough to test the response of DBMS operations on different types of attributes (i.e., values). For example, in the research performed for this thesis, the first two attribute values of each relation have the same numeric value. However, the first attribute value is stored as an integer and the second as a character string. One set of tests selects tuples based on the integer values; a second set of tests selects the same tuples based on character values. Response times may be affected by the processing differences related to the data types. Additional differences may result from the time required to format the data for output.
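
The idea of storing one numeric value twice, once as an integer and once as a character string, can be sketched in Python as follows. The function names and the right-justified, blank-padded string format are assumptions made for illustration; the thesis does not state how the character form is padded. With that padding, a range selection on either attribute returns the same tuples, so any difference in response time can be attributed to the data type.

```python
def make_tuples(count: int, width: int = 6):
    """Build (key, mirror) pairs: the same number as an int and as a string."""
    return [(k, str(k).rjust(width)) for k in range(1, count + 1)]


def select_by_int(tuples, low, high):
    """Range selection on the integer attribute."""
    return [t for t in tuples if low <= t[0] <= high]


def select_by_string(tuples, low, high, width: int = 6):
    """Range selection on the character attribute.

    Right-justified, blank-padded strings of equal width compare in
    numeric order, because a blank sorts before every digit.
    """
    lo, hi = str(low).rjust(width), str(high).rjust(width)
    return [t for t in tuples if lo <= t[1] <= hi]
```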


3. Schema of the Synthesized Data Used

The synthesized data used for this thesis has four sets of relations. Each set has several relations with different numbers of tuples. Each relation in a set has the same attributes. The attributes are the same among sets, differing only in number and length in order to provide a range of tuple lengths. Table 1 shows the range of relation characteristics.

The relations are stored in several databases. Two databases are used for testing single-relation operations. The first database contains all of the relations used in single-relation testing. The second database contains relations whose tuples are of 100 bytes and 200 bytes. This database uses compressed fields for strings (i.e., trailing blanks are dropped). Several databases are used to provide relations for testing join operations. For testing, it is desirable to spread the join operations over the two disks in the system. A full implementation of this desirable placement is not possible, because the storage allocation algorithms prevent us from controlling the location of specific relations.

Table 1 - The Relation Characteristics

Tuple Lengths             100, 200, 1000, 2000 bytes
Relation Logical Sizes    500, 1000, 2500, 5000, 10000 tuples
Relation Physical Sizes   50 kilobytes to 20 megabytes
Attributes                14 (for 100-byte tuples) or 24 (for
                          other tuple lengths)
Attribute Types           Sequential Integer, Random Integer,
                          Collated and Random Alphanumeric, Blocks

IV. GENERATING SYNTHESIZED DATA


A. A PARAMETERIZED RELATION GENERATOR

The Relation Generator (RG) is a parameterized program for generating relations for a database. It requests input from the user concerning the characteristics of a relation. First, the user is instructed to enter the relation name and size (i.e., the number of tuples). Then, the program requests data about each attribute. The data requested includes attribute name, value type (i.e., integer, string, etc.) and distribution of the attribute values. The relations generated are stored in ASCII files to simplify transfer between systems.

1. Capabilities

RG contains routines to generate sequential numbers, random numbers (either uniquely or nonuniquely), and character strings in collated order (See Appendix B). The user may also specify a file which contains a set of values for an attribute to be used in generating attribute values. This set is called a "value-set" and the file is called a value-set file. It is produced by the user by means of the Value-set Generator (described below). The actual range of values from the file to be used for an attribute is called the attribute's domain. The user specifies the number of values from the value-set to be included in the attribute's domain. It is not necessary that the domain contain all the values in the value-set. RG requires the user to define the distribution of the attribute values. The distribution is either in discrete blocks or random or both. A discrete distribution in which the attribute values are randomly distributed may be created by sorting a relation containing discrete blocks on a random number attribute.
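
The block distribution and the sort-based randomization can be sketched in Python as follows. This is a minimal illustration, not RG's Pascal code: the function names, the rule that the domain is the first few values of the value-set, and the handling of rounding are all assumptions.

```python
import random


def block_values(domain, proportions, size):
    """Lay out `size` values in discrete blocks, one block per domain value.

    `proportions` gives the relative weight of each value's block.
    """
    total = sum(proportions)
    values = []
    for value, weight in zip(domain, proportions):
        values.extend([value] * (size * weight // total))
    # Integer rounding may leave a shortfall; pad with the last domain value.
    values.extend([domain[-1]] * (size - len(values)))
    return values


def scatter(values, seed):
    """Randomize the order by sorting on a random-number attribute."""
    rng = random.Random(seed)
    keyed = [(rng.random(), v) for v in values]
    keyed.sort(key=lambda pair: pair[0])
    return [v for _, v in keyed]
```

For example, with a value-set of `["red", "green", "blue", "black"]`, a domain of its first three values, and proportions `[2, 1, 1]`, `block_values(domain, [2, 1, 1], 8)` yields four `red`, two `green` and two `blue` values in contiguous blocks. Passing that list to `scatter` keeps the same counts but interleaves the values, which is the effect the text attributes to sorting on a random number attribute.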
a. The Development Environment

RG is written in IBM Pascal/VS, running under the VM/CMS operating system. VM/CMS is an interactive, multiuser operating system. Because of operating system limitations, RG has been converted to an MVS (batch) process. Standard Pascal syntax has been utilized as much as possible. Pascal/VS extensions to the language have been used. Additionally, some of the file descriptor information is specific to the operating systems.

b. The Development Process

The first step in the development of the system is the drafting of a modular framework. Persons are then assigned to develop the different modules of the program. The different modules include the main program, the main generator module and the individual value-type generator modules. The individual modules produce specific types of values for the attributes.

The system has been developed using modern software engineering techniques. The different modules have been debugged separately. Program harnesses, which contain no logic except to invoke a procedure, have been used to test procedures and subprocedures. Module stubs, which simulate the actions usually performed by procedures, have been used in place of the procedures to test the main program and the main generator module. Once debugged, the modules have been integrated with the main program.

The responsibility for generating relations has been assigned to one person. Additional refinement of the system involved several items in addition to debugging. A utility to generate value-set files has also been created. Thus the other members of the team have been freed to work on other phases of the project.

c. Design Problems

Two major problems have been encountered in the preparation of RG. The first problem concerns the size of relations to be generated. In the original RG design, all of the linked lists of attribute values reside in the primary memory simultaneously. The size of the largest relation that has been generated is twenty megabytes. This requires twenty megabytes of the virtual memory space just to store all of the lists. Additional space would be required for the program and the overhead associated with linked lists (i.e., pointers to memory locations). This exceeds the virtual memory space available to a single user under VM/CMS.

This problem has been partially solved by accessing sequential files as a substitute for the linked lists. Therefore only one list of attribute values at a time is stored in the primary memory. However, a linked list of some of the longer attributes generated requires over two megabytes of memory just for the data, without considering the space required for pointers.

The second problem concerns the transportation of the files of generated relations to another system. Under the VM/CMS system at the Naval Postgraduate School, each user is allowed a limited amount of file space. This amount is much too small to hold most of the relations generated. Additional space is available on a temporary (i.e., one-day) basis. Also important is the fact that while VM/CMS files can be offloaded to tape, they are stored on tape in a non-standard format. There is no utility program to transfer VM/CMS files to tape in standard format. There is also no utility program to exchange files between the tapes of VM/CMS format and the tapes of MVS format.

It is apparent that VM/CMS is not a suitable environment in which to run the system. Therefore, it has been necessary to convert the system to run in the MVS environment. The MVS system writes tapes in the standard format. It also allows the user to have a much larger virtual memory space. In retrospect, it made sense to develop the system in an interactive system (i.e., VM/CMS). Fast turnaround contributes to faster program development, and the interactive environment makes debugging easier.


B. A MATRIX OF RELATIONS

The relations generated by RG are designed to support experiments over a range of relation sizes and characteristics. These sizes and characteristics are selected to allow maximum flexibility in pursuing experiments with a minimal number of relations in the test database. The parameters listed below are those of the relations produced in support of the benchmarking.

1. Standard Templates

All of the relations are characterized by the same general template. This template is shown in Table 2. Specific templates are derived from the general one. These templates correspond to the four tuple lengths used for testing (i.e., 100 bytes, 200 bytes, 1000 bytes and 2000 bytes). Each template is used to generate the relations of various sizes (52 - 128,466 tuples). Thus most of the tests can be run on many relations by changing only the relation name (or the values of the range variable) in the queries.





Table 2 - The General Template

Key             - a sequential number to be stored as an integer
                  field
Mirror          - a sequential number (same as Key) to be stored
                  as a character string
Random          - a random number to be stored as an integer field
Random Unique   - a unique random number to be stored as an
                  integer field
Collated        - a character string to be stored in collated
                  order
Letter          - a random alphabetic value
Set             - blocks of values from value-set files

Notes: some attributes are not used in some templates; some appear as multiple attributes, depending on the tuple length.

2. Flexibility

The relations are designed to provide flexibility in testing. Ideally the tests to be performed are known before designing the relations. However, the results from some of the tests may suggest a need for additional tests which have not been previously considered. Accordingly, the relations are designed to allow the design of additional tests without generating more relations.


C. THE GENERATING PROCEDURE

The generating procedure consists of three phases. The first phase consists of designing experiments and the relations to be used in those experiments. After the relations have been designed, they must be created and transported to the testing environment.

Generating relations is a simple process. First, VG is used to generate any necessary value-set files. Then, RG is used to generate relations. RG has been expanded to produce a description file. This file contains the attribute names and the characteristics of the attribute values in the relation. The description lists both the format of the generated file and the format of the relation as it is to be stored in the database.

1. The Generator System

The generator system consists of two major programs, the Relation Generator (RG) and the Value-set Generator (VG). Other programs and debugging aids may be necessary, depending on the environment(s) in which the system is implemented.

a. The Relation Generator (RG)

RG generates a relation file based on input from the user. It consists of four types of modules: the main module, the main generator module, the individual generator modules, and the collating module.

The Main Module - The main RG module contains very simple logic. RG prompts the user for the characteristics of the relation being generated. First, the name and size (in tuples) of the relation are requested. Then, the user is asked to determine the characteristics of the first attribute. The attribute characteristics are recorded in an attribute record (See Table 3). After the module obtains the necessary attribute characteristics, it invokes the main generator module.

The main generator module, as explained in the next section, produces linked lists of attribute values and returns to the main RG module. RG then invokes the collate module, which is detailed in the sequel. The collate module produces tuples by concatenating sets of attribute values. After the relation has been generated, the user is given the option of generating another relation or ending the process.

The Main Generator Module - The main generator
module is invoked to produce each set of attribute values.

Field                   Description

Attribute Name          attribute name
Attribute Type          data type of attribute values
String Length           used for string types
Lower Bound             first sequential integer and lower
                        bound for random integers
Upper Bound             upper bound on random integers
Generate Mode           [illegible in source]
Value Set Name          value-set file name
Relative Proportions    discrete distribution specification
Seed                    random integers





The characteristics of an attribute are passed to the module
in an attribute record. Using this record, the main module
invokes one of several individual generator modules,
depending on the characteristics of the attribute. The
individual generator module produces a linked list of
attribute values with the desired type and distribution, and
returns the list to the main generator module. The main
generator module opens a sequential file, writes the
attribute values into the file, closes the file, and returns
to the main RG module. There are therefore several such
files, known as attribute files.
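
The dispatch on attribute characteristics can be sketched in
Python. This is only an illustration under stated assumptions,
not the generator program itself: the field names follow the
attribute record, but the mode names ("sequential", "random",
"value_set"), the field types, and the use of Python lists in
place of linked lists and value-set files are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class AttributeRecord:
    # Field names follow the attribute record; types and defaults are assumed.
    name: str
    attr_type: str = "integer"          # "integer" or "string" (assumed)
    string_length: int = 0
    lower_bound: int = 0
    upper_bound: int = 0
    generate_mode: str = "sequential"   # hypothetical mode names
    value_set: list = field(default_factory=list)    # stands in for a value-set file
    proportions: list = field(default_factory=list)  # relative proportions
    seed: int = 0


def generate_values(rec, n_tuples):
    """Return n_tuples values for one attribute, chosen by generate mode."""
    rng = random.Random(rec.seed)
    if rec.generate_mode == "sequential":
        values = [rec.lower_bound + i for i in range(n_tuples)]
    elif rec.generate_mode == "random":
        values = [rng.randint(rec.lower_bound, rec.upper_bound)
                  for _ in range(n_tuples)]
    elif rec.generate_mode == "value_set":
        # Draw from the value set according to the relative proportions.
        values = rng.choices(rec.value_set,
                             weights=rec.proportions or None, k=n_tuples)
    else:
        raise ValueError("unknown generate mode: " + rec.generate_mode)
    if rec.attr_type == "string":
        # Pad or truncate every value to the fixed string length.
        width = rec.string_length
        values = [str(v).ljust(width)[:width] for v in values]
    return values
```

Fixing the seed makes a generated relation reproducible, which
is what allows the volume of a reply to be predicted for a given
query.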

Collate Module - The collate module acts as a simple
collator. It physically concatenates strings of attributes
to form a tuple. It is invoked to assimilate all the
attribute values in the attribute files into a file of the
relation. Information about the attributes is passed to the
collator as an array of attribute records. The collator
first opens the relation file and all the attribute files.
The relation is generated a tuple at a time. One attribute
value from each file is read. The values are concatenated to
produce a tuple. The tuple is then written to the relation
file. The collator repeats this process until all the tuples
have been produced.
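
The collation loop can be sketched as follows. This is a
minimal illustration, assuming text files holding one
fixed-width value per line; the function name and file layout
are hypothetical and are not taken from the generator program.

```python
def collate(attribute_paths, relation_path):
    """Concatenate one value from each attribute file into each tuple.

    Returns the number of tuples written to the relation file.
    """
    files = [open(path) for path in attribute_paths]
    try:
        with open(relation_path, "w") as relation:
            count = 0
            # zip() reads one line from every attribute file per step and
            # stops when the shortest file is exhausted.
            for values in zip(*files):
                relation.write("".join(v.rstrip("\n") for v in values) + "\n")
                count += 1
    finally:
        for f in files:
            f.close()
    return count
```

Because every attribute file is read sequentially and exactly
once, the memory needed is independent of the relation size.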

b. The Value-set Generator (VG)


The Value-set Generator (VG) is a simple utility
for setting up value-set files for RG. VG asks for the name
and size (i.e., the number of values) of the value-set file
to be created. The values are entered individually and
stored as strings in a random-access file for use by RG.


2. The Conversion Problem

Converting the programs to run in the batch
environment involves several tasks. These are the conversion
of the interactive programs to batch programs, the
submission of jobs to the batch system, and development of
the additional statements required to use the batch file
system. Although the programs had already been debugged in
the VM/CMS environment, extensive debugging has been
necessary after conversion to MVS.

Conversion of programs from VM/CMS to MVS is a
simple process. A virtual card deck is created in a VM/CMS
file which contains the source deck, the input data and the
file data required by the MVS system. This file is submitted
to the batch queue. The input for RG (i.e., the user's
replies) is in the card deck with the program.

Although it is not necessary, the code which
generated the instructions to the user for the input has
been removed for the MVS versions. The VM/CMS version has
been modified to create a file which contains the user's
responses to the program's prompts.

Differences between the batch and interactive
systems cause the difficulty in program conversion. The
batch system, MVS, requires much more in the way of file
parameter specifications, and is much less forgiving when
error conditions exist. There are some error conditions
which the user cannot foresee. For example, the system may
initially allocate space for a relation file on a volume
which does not have enough free space to cover secondary
allocations. When this happens the program is aborted.
However, it is not possible for the user to specify a
particular disk (i.e., one with sufficient space) for this
storage. For the two largest relation files (fifteen and
twenty megabytes), it has been necessary to write each of
the relations into two separate files on the batch system.
The two files were then combined when they were loaded into
the database.
3. Transporting the Relations to the Backend

a. Transporting the Data to the Host

The transportation of the data to the host is a
two-step process. The first step is the transfer of the
relation files from the MVS secondary storage to tape. A
system utility is used to accomplish this. The tapes are
then transported to the host, the UNIVAC 1100, and a similar
utility program is used to load data into the host secondary
storage. The host utility program translates the data into
ASCII disk files.
b. Loading Data Into the Backend 


The relations are loaded into the backend using a
vendor-supplied utility called a translator. This utility
prompts the user for information about the source file, the
target database, and the target relation.

The translator utility may be run interactively
or with file input. The database into which the relation is
to be loaded must already exist. The relation into which the
data is loaded may or may not exist. Database name, host
file name, and relation name must be supplied. Additionally,
for each attribute the attribute name, length of source
(in ASCII characters), and type of value to be stored in the
database must be supplied.





V. GENERATING TEST PROGRAMS 


A. THE TEST PLAN

The general test plan calls for several different types
of experiments. Among these are experiments involving only
one relation (i.e., selections and projections) and
experiments involving more than one relation (i.e., joins).

1. Experiments Involving a Single Relation

The selection and projection experiments are
designed to measure the system's performance in retrieving
data from a single relation. The response times measured
are the sum of four variables: the time to process a query,
the time to access the data, the time to process the data,
and the time to return the data. The time to process the
query is defined as the time to parse the query. By
carefully constructing sets of experiments, these variables
can be estimated.

Since the time to process a query is so small, it
may be ignored or combined with overhead for most
experiments. For experiments where it is significant, the
query processing time is minimized to prevent it from
dominating the time measurement. [...] precision. The
RDM 1100 allows the parse tree of a query to be stored in
the database. This capability allows the replacement of the
processing time, which is dependent on the host, with the
data access time, which is dependent only on the backend.
The additional data access time is the time to access the
command in storage. This is the same for all stored
commands.

The largest variables are the easiest to measure
with precision. Therefore, they are measured first and then
eliminated to measure the smaller variables.

The largest variables are likely to be those
representing the time to access, process and return data.
These can be measured with simple retrieve commands. A time
measurement of a retrieve which returns all the attribute
values of the tuples in a relation includes the times of all
of the four variables. However, a time measurement using an
aggregate function (e.g., count, which returns a single
count of the tuples meeting the qualifications of the query)
eliminates the time to return the data. Thus, this function
can be used effectively to measure the time to access and
process the data (tuples), i.e., two of the four variables.

Further, an assumption is made that for simple
commands the processor can process data at a rate which is
faster than the rate at which data can be brought into the
memory for processing. This allows the processing time to
be ignored. Therefore, the measurements reduce to a measure
of the access time.

Having quantified the larger variables, the time to
process data may be investigated. It has been assumed that
the processing time is not significant for simple commands.
However, if the commands are made more complex, then the
processing time is expected to increase. With a sufficiently
complex command which involves a small data-access time,
the processing time may become significant. Therefore,
experiments are conducted which minimize data access but
vary in complexity. It is of interest to determine when or
if the processing time becomes measurable and significant.

It is expected that projection operations will
increase the processing time. Therefore, several tests are
appropriate for testing projections. The first set of tests
measures the effect of projections on the processing time.
The second set checks to see if the processing time is
affected by the type(s) of attribute values projected (i.e.,
integer, string). The third set of tests measures the
performance of a projection on all of the attributes versus
a simple "retrieve all" command.

After the basic time variables have been estimated,
other performance factors are investigated. The use of
indices can reduce access time. By reducing the amount of
data brought into the memory, the processing time is also
reduced. However, the processing time will be increased by
the index access and search. Therefore, the use of indices
may increase the response time. Indexing requires a specific
set of tests to measure its performance under various
situations. Furthermore, different types of indices (i.e.,
clustered, non-clustered, multiple keys, etc.) must also be
investigated. One expected factor in index performance is
the ratio of the index size (in blocks of storage) to that
of the relation.

String compression (removal of trailing spaces) is a
factor which can affect the processing time, the access time
and the return time. The use of compression can reduce
block storage dramatically. This, in turn, reduces the
access time. However, it may require more time to process a
compressed string versus a non-compressed one, if processing
requires expansion of the compressed attribute. If expansion
is not required for processing, then the host may have to
expand it for proper formatting. How expensive (in time) is
this? Does this compensate for the reduction in the
response time resulting from returning a smaller amount of
data (the compressed string) to the host?
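
The storage side of this trade-off can be estimated with a
short calculation. The sketch below is an illustration only: it
assumes a hypothetical block size, ignores any per-tuple or
per-block bookkeeping the backend may add, and says nothing
about the time cost of expanding the strings again.

```python
import math


def blocks_needed(values, block_size=2048, compress=False):
    """Estimate blocks of storage for a list of string attribute values.

    block_size is an assumed parameter, not a figure from the backend.
    With compress=True, trailing spaces are removed before counting.
    """
    total_bytes = sum(len(v.rstrip(" ")) if compress else len(v)
                      for v in values)
    return math.ceil(total_bytes / block_size)


# 1000 three-character strings stored in 100-character fields.
padded = ["abc".ljust(100)] * 1000
uncompressed = blocks_needed(padded)
compressed = blocks_needed(padded, compress=True)
```

The ratio of the two block counts gives a first estimate of the
reduction in access time, against which any expansion cost can
be weighed.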

Other performance factors may be examined either
individually or within other test procedures. An example of
this is the use of different types of attributes (i.e.,
integer versus string). A complete series of tests can be
designed to test this issue in detail. However, it is also
possible to investigate this area in conjunction with the
tests of processing time and projections.





Operations involving more than one relation (i.e.,
joins) are affected by the same time variables as those
involving only a single relation. Initial testing should
involve only two relations.

It is expected that the access time will become
dominant for join operations. This is because the same data
may have to be accessed repeatedly. Memory size has an
effect on the amount of accessing required in a join
operation. If memory size is large enough to allow both
relations to be accessed once and left in the memory, then
the processing time may become significant. In this
circumstance both the access time and the processing time
are expected to increase proportionally to the relation
size. The unknown factor is the rate at which the processing
time increases. However, it may be that neither relation is
small enough to be held in the memory for processing. In
this case much accessing must be performed. It may also be
of interest to examine join performance between these two
extremes.

Joins should be designed to take advantage of any
size differential between the two relations. If the smaller
relation can be completely held in the memory, then it can
be accessed once and brought into the memory. The larger
relation would also be accessed just once as it is brought
into the memory as a stream. If, on the other hand, the
larger relation is brought into the memory, it must be
brought into the memory a portion at a time. The smaller
relation may have to be reaccessed for each portion of the
larger relation.
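
The difference between the two strategies can be made concrete
by counting block accesses. The functions below are a sketch
under simplifying assumptions: relation and memory sizes are
given in blocks, the space needed to buffer the streamed
relation is ignored, and the function names are hypothetical.

```python
import math


def accesses_holding_smaller(small_blocks, large_blocks):
    """Smaller relation fits in memory: each relation is read once."""
    return small_blocks + large_blocks


def accesses_holding_larger(small_blocks, large_blocks, memory_blocks):
    """Larger relation is held a portion at a time.

    The smaller relation is reaccessed once for every portion of the
    larger relation that is brought into the memory.
    """
    portions = math.ceil(large_blocks / memory_blocks)
    return large_blocks + portions * small_blocks


# Example: 100-block and 1000-block relations, 250 blocks of memory.
best = accesses_holding_smaller(100, 1000)
worse = accesses_holding_larger(100, 1000, 250)
```

With these example sizes, holding the smaller relation costs
1100 block accesses, while holding the larger one in four
portions costs 1400.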

It is important to examine the performance of joins
both with and without selection. In performing these tests,
the strategy of the operations should be examined carefully.
The selection should be performed before the actual join
operation to minimize the volume of data being joined.

Another area of interest is the effect of index
usage on joins. The performance to be expected may be
indicated by the single-relation index experiments. However,
the specific results may suggest the efficiency with which
the join operation has been implemented.

If inequality joins have been implemented,
performance testing should be conducted using them. If they
have not been implemented, it may be valuable to know if,
and with what difficulty, they can be simulated.

Once experiments for join operations on two
relations are complete, experiments should be conducted
with larger numbers of relations in one join operation. By
measuring the performance on multiple join relations it may
be possible to isolate a fixed overhead for all the joins.





3. A Flexible Test Plan

A general test plan should be developed before any
of the experiments are designed. It should be flexible to
allow testing to follow different paths of discovery. It is
expected that the results of some experiments may suggest
other experiments. Time must be allotted for the expansion
of any test set.

However, it must also ensure that a sufficient
amount of data is obtained. The tests must cover the
universal operations (i.e., those expected of any DBMS).
Among the universal operations, known bottlenecks and
breakpoints must be specifically tested. It should also
investigate any specific strengths, weaknesses or
idiosyncrasies of the DBMS.


B. MEASUREMENT TOOLS

The response-time measurements in these experiments were
taken from the backend-machine clock. This clock has a
resolution of 1/60 second and an accuracy within 1/60th of
a second. The response time of the backend machine on small
relations is dominated by communications overhead. The
minimum response time is about one second. So, for the tests
conducted, the 1/60-second resolution is reasonably
accurate. However, if the overhead can be reduced, a more
precise measuring device is required. Most mainframe
operating systems provide a clock with a resolution in
microseconds. This is not available in the backend machine.


C. QUERY SCRIPTS VERSUS PROGRAMS

Two methods exist for performing benchmark experiments.
These methods involve the use of query scripts and programs.
The first of these simulates an interactive session
accessing the database. The actual terminal input is
prepared ahead of time and stored in a "run-stream" file,
known as a query script. The host operating system can be
instructed to obtain its input from a file instead of via
the terminal. Thus a series of tests can be collected
together in a script. Additionally, the output can be
redirected to a file, removing the overhead in communicating
with the terminal.
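
The idea of a query script can be illustrated with a small
timing harness. This is a sketch only: the function name, the
one-query-per-line script format, and the `execute` callable
that submits a query to the DBMS are assumptions made for the
example and do not describe the host's run-stream facility.

```python
import time


def run_script(lines, execute, clock=time.perf_counter):
    """Submit each non-blank script line as a query and time it.

    lines   -- iterable of query strings (for example, an open file)
    execute -- callable that submits one query (hypothetical interface)
    clock   -- callable returning the current time in seconds
    Returns a list of (query, elapsed_seconds) pairs.
    """
    timings = []
    for line in lines:
        query = line.strip()
        if not query:
            continue
        start = clock()
        execute(query)
        timings.append((query, clock() - start))
    return timings
```

Because the script is read from a file and the timings are
collected in a list rather than printed as each query finishes,
terminal communication stays out of the measured interval.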

The use of batch programs involves much more of the
programmer's time in the development and debugging of the
program. Development of batch programs also represents a
larger drain on the host's resources. This factor could
restrict testing at many installations.

Since queries must be interpreted whether they come from
a program or a script, the use of batch programming did
not offer the advantage of bypassing the query processor.
Therefore, there is some question whether or not a batch
program would provide superior performance results. This
question and the ease of development of query scripts
suggest that the use of query scripts is the desired method.
If batch programming offers a significant performance
improvement, additional testing must be performed using
batch jobs. Thus it would be wise to run a complete battery
of tests in the interactive environment, followed by a
subset of these tests in the batch environment. This subset
should be designed to test areas where the batch process may
have its most impact (i.e., the data return time).


D. INTERPRETING THE DATA
The interpretation of data is a very important part of
the testing phase. There are two reasons for this. First,
conclusions cannot be drawn from raw data. Second, timely
interpretation enables the persons conducting the
experiments to analyze the results and identify further
testing. A collection of raw data is very hard to interpret.
Therefore, any results obtained should be graphed
immediately. Graphing the results immediately allows for
prompt identification of errors and unexpected results.
Related results should also be graphed together. For
example, all results from a query applied to relations of
varying tuple length and relation size should be graphed
together. Once the raw data is analyzed, the graphs may be
refined. The graph axes may be varied as appropriate. For
example, the response time may be graphed against the tuple
length, against the relation size (in tuples or the number
of blocks of the storage space occupied) and against the
quantity of data returned to the user.





VI. CONCLUSIONS


A. RESULTS
The results obtained from testing several configurations
of a relational database machine have provided a basis for
developing a general set of benchmark tests for relational
database machines. The benchmarking tests have been mostly
machine independent. Although a testing methodology is
provided herein with enough results on certain
configurations, additional testing is necessary. This
testing should be performed on other DBMSs, preferably with
different characteristics, to ensure that the test is
complete and not machine-specific. The results of testing
selection and projection operations are described in [4].
Results from performing tests on join operations are
described in [5].

The response time has been shown to be proportional
to the time required to access the data. This, in turn, has
been shown to be proportional to the physical size of the
database. Methods used to reduce the amount of data to be
brought into the memory for processing (such as indexing and
string compression) improve the response time.

The response time is also proportional to the amount
of data returned to the user. In the case of the RDM 1100,
the time required to return the data is the largest
component of the total response time. Thus, when the
necessary information is obtained via aggregate functions,
the response time is greatly improved. It is difficult to
determine how much of the response time is due to the
backend machine and how much is due to the host. However,
loading the host definitely degrades the response time. An
analysis of the response time under various load conditions
of the host may lead to a distinction of the host response
time vs. the backend response time.

The time required to process queries and the time
required to process data in the memory are relatively small
for the RDM 1100. This may not be true for other systems.
Therefore, it is imperative that these areas be carefully
examined when adapting the proposed tests to systems with
different architectures.

The results of the experiments show that DBMSs do
have characteristics which may be measured. A well-conceived
series of tests can measure an installation's performance,
and gain an indication of its performance and its
"personality." These tests can be used to compare DBMSs
against each other. For the DBMS implementor, the tests
also provide a method of determining poorly implemented
parts of the system.

The experiments which have been performed have
supported two different types of study. The first is the
actual measurement of the backend machine's performance
(albeit with light load and few configurations). The RDM
1100 provides a comprehensive (although incomplete)
relational model which successfully offloads DBMS tasks from
the host. Since evaluation of the machine was conducted
simultaneously with the research, the task of evaluating it
has been accomplished. Some areas that have not been fully
investigated are due to the lack of time. Other areas that
have not been fully investigated are due to incomplete
implementation. As an example of these areas, the use of ALL
in a retrieve is contingent upon the number of attributes.
At one point, the use of ALL on a relation with a large
number of attributes results in only an error message. After
installation of the accelerator, the use of ALL halts the
command. After the accelerator is removed, the problem
persists. Another deficiency noted has been the inability
to perform an inequality join.


B. A RELATIONAL BENCHMARKING METHODOLOGY


The proposed set of benchmark tests has four phases.
The first phase consists of preliminary tests designed to
identify the best method of measuring the system's response
time. The second phase involves isolating the different
components of the response time. The third phase
investigates the system response in specific areas. The
fourth phase verifies the results obtained during phases two
and three.

Most systems have at least one mechanism which
provides a time measurement. Initial testing is designed to
identify the one which optimizes the precision obtained
versus the ease of obtaining that time. Once the measurement
method has been chosen, it is checked to ensure that it is
fine enough to provide the necessary precision. It is
also necessary to ensure that the overhead involved in
retrieving the time does not reduce the precision of the
measurements being taken.

If the necessary precision is not readily available,
then techniques are available to increase the precision of
the results. These techniques involve performing an
operation several times and calculating an average. The
techniques selected must be reviewed for side effects. The
DBMS may have the capability of internally optimizing
performance. For example, the order in which the queries are
submitted to the DBMS may allow the DBMS cache memory
management to reduce disk access.
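
The averaging technique can be sketched as follows. The
function and the simulated clock are illustrative assumptions:
the clock is modeled as advancing in whole ticks of 1/60
second, and the operation is a stand-in for submitting a query.

```python
def average_time(operation, repetitions, clock):
    """Time a batch of repetitions and return the mean time per run.

    Reading the clock only at the start and end of the batch spreads
    the one-tick quantization error over all of the repetitions.
    """
    start = clock()
    for _ in range(repetitions):
        operation()
    return (clock() - start) / repetitions


class SimulatedBackend:
    """Stand-in for a machine whose clock ticks every 1/60 second."""

    def __init__(self):
        self.units = 0              # true elapsed time in 1/600 second

    def operation(self):
        self.units += 7             # each run truly takes 7/600 second

    def clock(self):
        return (self.units // 10) / 60   # only whole ticks are visible


backend = SimulatedBackend()
single = average_time(backend.operation, 1, backend.clock)
backend = SimulatedBackend()
averaged = average_time(backend.operation, 60, backend.clock)
```

In this simulation a single run is shorter than one tick and
is measured as zero, while the average over 60 runs recovers
the true time. On a real system the repeated runs may be served
from cache, which is the side effect noted above.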

In the case of the RDM 1100, two different methods
of measuring the time could have been used. The first method
is to obtain a time stamp from the host operating system.
Although it may have provided sufficient precision, it has
not been investigated because of the other method
available. The second method is a time stamp available from
the IDM. A built-in function supplies an elapsed time in
measurement intervals of one-sixtieth of a second. This
provides sufficient precision for the measurements. Since
the elapsed time is a sufficient measurement, the more
precise measurement has not been used.
2. Phase Two - Component Isolation


Once an adequate method for measuring time has been
selected, it is used to measure the performance in several
specific areas. These areas are the four components which
are involved in all queries: the time to process the query
(i.e., parse it), the time to access the data in the
database, the time to process the data in the memory, and
the time to return the data requested. These components may
be considered the DBMS's primitive operations. These
primitives do not take advantage of any methods used to
improve the response time of a given query. They merely
measure the performance of the hardware and software in
performing specific functions. It has been stated that a
performance measurement of some aspects of a DBMS is really
a measurement of the operating system. The operating system
does affect the response. However, in the case of a backend
machine, this effect is minimal for some operations. While
this issue may be debated, it is not of interest to the
user. The user is not interested in the reasons why a
system responds poorly. He is interested only in the fact
that a system performs properly and the fact that the
system's performance is better (or worse) than that of
another system. He is most interested in the possibility of
obtaining a quicker response time on his application.

The system primitives are measured by a set of
queries which isolate different aspects of the response
time. One set of queries is designed to return the same
amount of data from relations with the same number of
tuples, but having different tuple sizes. Once a tuple is in
the memory, it takes the same amount of time to project one
attribute from a set of 1000-byte tuples as from a set of
2000-byte tuples. The difference in the response time for
these two queries is due only to the time necessary to bring
the tuple into the memory. The times required to process
the query, to process the data and to return the data are
the same.

The second set of queries is designed to measure the
time required to return the data to the user. These queries
return a different amount of data (in bytes) from projection
operations on the same number of attributes in the same
format (i.e., strings, etc.) in relations which are of the
same physical size. Thus, the access time is the same, the
processing time is the same, and the query processing time
is the same.

The third set of queries is designed to isolate data
processing time. In this set, the queries return the same
amount of data from relations of the same physical size
(i.e., identical storage requirements) but having a
different number of tuples. This provides a measurement of
the processing required relative to the number of tuples
processed. The query processing time, the data access time,
and the data return time are the same.

The fourth set of queries is designed to measure the
query processing time. For operations on relations of any
significant size, this is hard to measure. Even on small
relations, it may not be significant compared to simple
system overhead. This set of queries is more complex than
the previous sets. The queries are constructed to allow the
other three time elements (i.e., the three just measured)
to be subtracted from the measurements, leaving only the
query processing time. Considering the difficulty in
obtaining a precise measurement of the query processing
time, it may not be worthwhile to determine this value
because of its small size.

The previous discussion indicates that the query
sets are independent. However, with proper planning the
query sets may be combined with equivalent results. In the
graph shown in the accompanying figure, one set of
experiments provides a measurement of data access times and
data return times. The set also isolates the constant query
overhead (which includes the query processing time).

The figure represents the response time of two
queries. One query selects five percent of the tuples and
returns all of the attribute fields from each tuple. The
second query is identical except that it selects ten percent
of the tuples. The queries are both run against relations
with 100-byte tuples. The relations vary in size up to
80,000 tuples. Point A on the graph represents the five
percent selection on 10,000 tuples. Point B represents the
ten percent selection on 5000 tuples. Since each of the
queries returns 500 tuples, the time to return the data is
the same. The overhead associated with each query,
including query processing time, is the same. Therefore,
the difference between the response times represented by
Points A and B is the difference in the access time and the
processing time of the queries. Point A represents a
retrieve on 10,000 tuples, which is 500 blocks of disk
storage. Point B represents a retrieve on 5000 tuples, or
250 blocks. Assuming that the processing time is
insignificant relative to the access time, the difference in
the two response times is the time to access 250 disk
blocks.

The overhead for all the queries shown on the graph
is the same and is represented by the common intercept of
the vertical axis. If the time represented by Point B is
reduced by the overhead and the time to access 250 blocks,
then the result is the time to return 500 100-byte tuples.
Therefore, the use of one query set has identified rates for
accessing data (in blocks per second) and returning data (in
bytes per second).
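
The arithmetic of this derivation can be written out as a short
calculation. The sketch below uses the block and tuple counts
as read from the text (500 and 250 blocks, 500 tuples of 100
bytes); the response times and overhead in the example are
hypothetical inputs, not measured results.

```python
def derive_rates(t_a, t_b, overhead,
                 blocks_a=500, blocks_b=250,
                 tuples_returned=500, tuple_bytes=100):
    """Derive access and return rates from Points A and B.

    t_a, t_b -- response times (seconds) at Points A and B
    overhead -- constant query overhead (the vertical-axis intercept)
    Processing time is assumed insignificant relative to access time.
    """
    # Both queries return the same data, so the difference in response
    # time is the time to access the extra blocks.
    seconds_per_block = (t_a - t_b) / (blocks_a - blocks_b)
    # Remove overhead and access time from Point B to leave return time.
    return_seconds = t_b - overhead - blocks_b * seconds_per_block
    return {
        "blocks_per_second": 1 / seconds_per_block,
        "bytes_per_second": tuples_returned * tuple_bytes / return_seconds,
    }


# Hypothetical timings chosen only to show the calculation.
rates = derive_rates(t_a=20.0, t_b=15.0, overhead=1.0)
```

With these made-up inputs the extra 250 blocks take 5 seconds,
giving 50 blocks per second, and the 50,000 bytes returned take
the remaining 9 seconds.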


3. Phase Three - System Response


After the time elements have been measured, a set of
queries is performed which measures the effect of methods
used to improve the system response. An example of this is
the use of indexes. Theoretically, the use of indexes should
improve system performance by decreasing the amount of data
accessed. However, the index must be accessed and
processed. Areas of interest include the following. At what
point, if any, does the use of indexes become important?
Therefore, performance of indexed relations is measured
over a wide range. What type of index (i.e., clustered or
non-clustered) provides the best performance and what are
the trade-offs? What scope of indices (i.e., one attribute,
two, or more) provides the best performance? The latter
question may be one dependent on the application.
In the case of the RDM 1100, it has been noted that, if the
index is defined when the relation is being created, then
the size of a relation with a clustered index is larger than
that of the same relation if the index is defined after
the data has been entered into the relation. This is
because the loading algorithm assumes a normal distribution
of key values, while the data is in key sequence. The data
loaded had been generated already sorted.

Additional testing should be performed to get the
"feel" of the system. By becoming familiar with the
system's capabilities, the testing personnel should be able
to determine interesting lines of experimentation. These
areas may include the overhead associated with projection
operations, the use of string compression techniques, and
the efficiency of join operations (in different memory
configurations, when available).

4. Phase Four - Verification

The last phase takes place after the other tests
have been reviewed and graphed. Analysis of the previous
tests should provide some meaningful results about system
performance in general, and in particular areas. The
verification phase serves to perform tests which verify or
disprove the analysis of the previous tests. It also
provides an opportunity to rerun any tests which appear
erroneous or suspicious. In this phase, additional tests may
take advantage of the flexibility designed into the
synthetic database.





C. SUMMARY

Investigation of the performance of several
configurations of a backend relational database machine has
provided considerable insight into what may be a sound basis
for general performance testing on relational DBMSs. In this
thesis, a methodology has been laid out and the initial
steps taken in that methodology have been defined. A
complete framework for subsequent phases has not been fully
developed, but their contents have been discussed. While
the results described relate to a specific series of
relational database machines, the basic methodology may
apply to relational database machines in general.





APPENDIX A 


 1. Databases - catalog of databases in the system
 2. Disks - list of disks known to the system
 3. [name illegible] - used by IDM for concurrency control
 4. Configure - information about serial and parallel [...],
    checkpoint interval
 5. [name illegible] - information about current activity in
    the IDM


Database Tables


 1. Relation - catalog of all objects (relation, view,
    stored command) in the database
 2. Attribute - catalog of each attribute of each relation
 3. Indices - catalog of indices that exist in the database
 4. Protect - catalog of protection information in the
    database
 5. Query - stored commands and views
 6. Crossreference - catalog of dependencies among
    relations, views and stored commands
 7. Transact - transaction logging relation
 8. Users - mapping of user and group names to user ID
 9. [name illegible] - mapping from host ID and user to [...]
10. Blockalloc - catalog of disk blocks
11. [name illegible] - database allocation
12. [name illegible] - temporary transaction logging relation
13. Descriptions - user definable descriptions





PROGRAM GR2014; 


APPENDIX B 


Database Generator Program (CMS 
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